Using Category-Based Adherence to Cluster Market-Basket Data
Abstract
In this paper, we devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse and to contain a large number of outliers. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item (i.e., leaf) node or a category (i.e., internal) node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this measurement, we develop an efficient clustering algorithm for market-basket data, called algorithm CBA (standing for Category-Based Adherence), whose objective is to minimize the category-based adherence. A validation model based on Information Gain (IG) is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm CBA significantly outperforms prior work in both execution efficiency and clustering quality for market-basket data.
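The two definitions in the abstract (item distance and category-based adherence) can be illustrated with a minimal sketch. It is not the authors' implementation of algorithm CBA, and several details are assumptions that the abstract does not state: the toy taxonomy `PARENT`, the function names, searching for the nearest large node only upward along the item's ancestor path, counting a category once per transaction that contains any of its descendants, and the penalty used when no node on the path is large.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent; the root has parent None.
PARENT = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
}


def ancestors_inclusive(node):
    """Yield the node itself, then its parent, grandparent, ... up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def occurrence_counts(cluster):
    """Count, for every taxonomy node, the transactions in the cluster that touch it.

    Assumption: a category is touched by a transaction if the transaction
    contains at least one item below that category.
    """
    counts = Counter()
    for transaction in cluster:
        touched = set()
        for item in transaction:
            touched.update(ancestors_inclusive(item))
        counts.update(touched)
    return counts


def item_distance(item, counts, threshold):
    """Number of links from the item up to its nearest large node."""
    links = 0
    for links, node in enumerate(ancestors_inclusive(item)):
        if counts[node] > threshold:  # "large": count exceeds the threshold
            return links
    # Assumed penalty when no node on the path is large: one link past the root.
    return links + 1


def adherence(transaction, cluster, threshold):
    """Average item distance of a non-empty transaction to the cluster."""
    counts = occurrence_counts(cluster)
    total = sum(item_distance(item, counts, threshold) for item in transaction)
    return total / len(transaction)
```

With the hypothetical cluster `[{"milk", "bread"}, {"milk", "cheese"}, {"milk"}]` and a threshold of 1, the nodes `milk`, `dairy`, and `food` are large, so `adherence({"milk", "cheese"}, cluster, 1)` evaluates to 0.5 (distance 0 for `milk`, 1 for `cheese`). A lower value means the transaction fits the cluster more closely, which matches the stated objective of minimizing adherence.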
Related resources
A Combined Approach for Segment-Specific Analysis of Market Basket Data
There are two main research traditions for analyzing market basket data that exist more or less independently from each other, namely exploratory and explanatory model types. Exploratory approaches are restricted to the task of discovering cross-category interrelationships and provide marketing managers with only very limited recommendations regarding decision making. The latter type of models ...
Market Basket Analysis Visualization On A Spherical Surface
This paper discusses the visualization of the relationships in e-commerce transactions. To date, many practical research projects have shown the usefulness of a physics-based mass-spring technique to lay out data items with close relationships on a graph. We describe a market basket analysis visualization system (MAV) using this technique. This system is described as follows: (1) integrate...
An Efficient Clustering Algorithm for Market Basket Data Based on Small Large Ratios
In this paper, we devise an efficient algorithm for clustering market-basket data items. In view of the nature of clustering market basket data, we devise in this paper a novel measurement, called the small-large (abbreviated as SL) ratio, and utilize this ratio to perform the clustering. With this SL ratio measurement, we develop an efficient clustering algorithm for data items to minimize the...
A Dynamic Analysis of Market Efficiency on Benchmark Crude oil markets: Based on the Adaptive Market Hypothesis
This paper examines the applicability of the adaptive market hypothesis (AMH) as an evolutionary alternative to the efficient market hypothesis (EMH) by studying daily returns on the three benchmark crude oils. The data coverage of daily returns is from January 2nd, 2003 to March 5th, 2018. In this paper, two different tests in the form of two distinct classes (linear and nonlinear) have bee...
Associated Map and Inter-Purchase Time Model for Multiple-Category Products
The continued rise of e-commerce is the main driver of the rapid growth of global online purchasing. Consumers can buy nearly everything they want on one occasion through online shopping. Purchase behavior models that focus on a single product category are insufficient to describe online shopping behavior. Therefore, analysis of multi-category purchases is becoming increasingly popular. For example, ...